DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-Bit CNNs
TABLE 4.4
Comparison on the ImageNet dataset with DCP-NAS of the distance
calculation methods used to constrain the gradient of binary NAS in
the tangent direction, i.e., Eq. 4.31. We use the small model,
DCP-NAS-S, to evaluate the searched architecture.
Method              Top1 (%)   Top5 (%)   Memory (MBits)   Search Cost
Cosine similarity   62.5       83.9       4.2              2.9
L1-norm             62.7       84.3       4.3              2.9
F-norm              63.0       84.5       4.2              2.9
much smaller performance gap with real-valued NAS at a lower search cost by a clear
margin. We conduct ablation experiments on different architecture discrepancy calculation
methods to further clarify the tangent propagation. As shown in Table 4.4, the F-norm applied
in Eq. 4.31 achieves the best performance, while cosine similarity and the L1-norm are
less effective than the F-norm.
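The three discrepancy measures compared in Table 4.4 can be sketched as follows. This is a minimal illustration, assuming the discrepancy in Eq. 4.31 is computed between the parent and child architecture-parameter matrices; the function name `architecture_discrepancy` and the example tensor shapes are hypothetical, and the exact form of Eq. 4.31 may differ.

```python
import numpy as np

def architecture_discrepancy(alpha_parent, alpha_child, method="fro"):
    """Hypothetical sketch of the three distance measures from Table 4.4,
    applied to parent/child architecture parameters (assumed 2-D arrays)."""
    diff = alpha_parent - alpha_child
    if method == "fro":
        # Frobenius norm of the difference (best performer in Table 4.4).
        return np.linalg.norm(diff, ord="fro")
    if method == "l1":
        # Element-wise L1 norm of the difference.
        return np.abs(diff).sum()
    if method == "cosine":
        # Cosine distance (1 - cosine similarity) between flattened parameters.
        p, c = alpha_parent.ravel(), alpha_child.ravel()
        return 1.0 - p @ c / (np.linalg.norm(p) * np.linalg.norm(c))
    raise ValueError(f"unknown method: {method}")

# Toy example: 14 edges x 8 candidate operations (shapes are illustrative).
rng = np.random.default_rng(0)
a_parent = rng.normal(size=(14, 8))
a_child = a_parent + 0.1 * rng.normal(size=(14, 8))
for m in ("cosine", "l1", "fro"):
    print(m, architecture_discrepancy(a_parent, a_child, m))
```

Unlike the cosine distance, both norm-based measures are sensitive to the magnitude of the parameter difference, which is one plausible reason the F-norm constrains the binary-NAS gradient more effectively in this ablation.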